Metadata-Version: 2.4
Name: dae
Version: 0.1.0
Summary: Diffusion-Aligned Embeddings: weighted k-NN graph construction and CTMC embedding optimizer
Author-email: DAE Authors <anon@example.com>
License: MIT License
        
        Copyright (c) 2025 DAE Authors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: numba>=0.55
Requires-Dist: scipy>=1.7
Requires-Dist: scikit-learn>=1.0
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.7; extra == "faiss"
Provides-Extra: hnswlib
Requires-Dist: hnswlib>=0.7; extra == "hnswlib"
Provides-Extra: annoy
Requires-Dist: annoy>=1.17; extra == "annoy"
Provides-Extra: pynndescent
Requires-Dist: pynndescent>=0.5; extra == "pynndescent"
Provides-Extra: sympy
Requires-Dist: sympy>=1.10; extra == "sympy"
Dynamic: license-file

# Diffusion‑Aligned Embeddings (DAE)

This repository contains a reference implementation of the Diffusion‑Aligned
Embeddings (DAE) algorithm.  DAE is a method for constructing weighted
affinity graphs with self‑tuning bandwidths and for computing embeddings via
continuous‑time Markov chain (CTMC) dynamics.  It supports a variety of
distance metrics, kernel families and k‑NN backends, and can be accelerated
with optional libraries such as FAISS, hnswlib, Annoy or pynndescent.
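
The self-tuning bandwidths follow UMAP's smooth k-NN distance step. Each sample `i` gets an offset `rho_i`, the distance to its nearest neighbor. It also gets a bandwidth `sigma_i`, chosen by binary search so that `sum_j exp(-max(0, d_ij - rho_i) / sigma_i) = log2(k)`. The NumPy sketch below illustrates that published procedure. The function name `smooth_knn_bandwidths` and its arguments are hypothetical. The package's own implementation, whose results `build_weighted_graph` returns as `rho` and `sigmas`, may differ in details such as minimum-bandwidth safeguards.

```python
import numpy as np


def smooth_knn_bandwidths(knn_dists, n_iter=64, tol=1e-5):
    """Hypothetical sketch of UMAP-style per-sample (rho, sigma).

    knn_dists: (n, k) distances to each sample's k nearest neighbors,
    sorted ascending, with the sample itself excluded.
    """
    n, k = knn_dists.shape
    target = np.log2(k)
    rho = np.empty(n)
    sigma = np.empty(n)
    for i in range(n):
        d = knn_dists[i]
        positive = d[d > 0]
        # rho_i: distance to the nearest neighbor at non-zero distance
        rho[i] = positive[0] if positive.size else 0.0
        shifted = np.maximum(d - rho[i], 0.0)
        lo, hi, mid = 0.0, np.inf, 1.0
        for _ in range(n_iter):
            total = np.exp(-shifted / mid).sum()
            if abs(total - target) < tol:
                break
            if total > target:
                # Too much mass: shrink sigma
                hi = mid
                mid = (lo + hi) / 2
            else:
                # Too little mass: grow sigma (double until bracketed)
                lo = mid
                mid = mid * 2 if hi == np.inf else (lo + hi) / 2
        sigma[i] = mid
    return rho, sigma
```

Because the nearest neighbor always contributes `exp(0) = 1` and the sum grows monotonically with `sigma`, bisection is enough to hit the `log2(k)` target.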

## Features

* Build a weighted k‑nearest neighbor graph with per‑sample bandwidths
  following the UMAP self‑tuning procedure.
* Choose from several symmetrization rules (mean, max, min, geometric,
  harmonic, UMAP) to fuse directed edge weights into an undirected graph.
* Use fast k‑NN search backends including FAISS, hnswlib, Annoy or
  pynndescent when available, or fall back to scikit‑learn.
* Select distance metrics (Euclidean, cosine, Mahalanobis) and register
  custom metrics at runtime.
* Choose kernel families (UMAP heavy‑tail, Student‑t, exponential, power‑law
  and more) or provide your own numba‑compiled kernel.
* Compute diffusion embeddings with a continuous‑time Markov chain optimizer
  (see `ctmc_engine.py`).
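
The symmetrization rules combine the directed weight `w_ij` with its reverse `w_ji`. The sketch below shows common definitions on a dense matrix, with the rule names taken from the list above. The exact formulas in the package are an assumption, particularly how `geometric` and `harmonic` treat one-sided edges. The `umap` rule is UMAP's fuzzy set union `a + b - a*b`. `symmetrize` is a hypothetical helper, not part of the `dae` API.

```python
import numpy as np


def symmetrize(W, rule="max"):
    """Fuse a directed weight matrix W into a symmetric one (sketch)."""
    A, B = W, W.T
    if rule == "mean":
        return 0.5 * (A + B)
    if rule == "max":
        return np.maximum(A, B)
    if rule == "min":
        return np.minimum(A, B)  # drops edges present in one direction only
    if rule == "geometric":
        return np.sqrt(A * B)
    if rule == "harmonic":
        denom = A + B
        out = np.zeros_like(W, dtype=float)
        # 2ab / (a + b), leaving 0 where both directions are absent
        np.divide(2 * A * B, denom, out=out, where=denom > 0)
        return out
    if rule == "umap":
        return A + B - A * B  # probabilistic (fuzzy) union
    raise ValueError(f"unknown symmetrization rule: {rule!r}")
```

For a pair with `w_01 = 0.8` and `w_10 = 0.2`, the rules give 0.5 (mean), 0.8 (max), 0.2 (min), 0.4 (geometric), 0.32 (harmonic) and 0.84 (umap). The choice determines how strongly asymmetric neighborhoods are kept.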

## Installation

Install the core package from source using pip:

```bash
pip install .
```

The core dependencies are `numpy`, `scipy`, `numba` and `scikit-learn`.  To
enable optional k-NN backends or symbolic kernel expressions, install one or
more of the extras:

```bash
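# Quote the brackets so shells such as zsh do not expand them as globs
pip install ".[faiss]"        # FAISS-accelerated k-NN
pip install ".[hnswlib]"      # hnswlib-accelerated k-NN
pip install ".[annoy]"        # Annoy-accelerated k-NN
pip install ".[pynndescent]"  # pynndescent-accelerated k-NN
pip install ".[sympy]"        # symbolic kernel expressions
pip install ".[faiss,sympy]"  # several extras at once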
```
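
To see which optional backends an environment can import after installing extras, you can probe the import names without importing them. The names are `faiss` (installed by `faiss-cpu`), `hnswlib`, `annoy`, `pynndescent` and `sympy`. This is a sketch under assumptions. The helpers `backend_status` and `pick_backend` are hypothetical, and the preference order in `pick_backend` is illustrative only; the README does not say how `dae` ranks backends, only that it falls back to scikit-learn.

```python
import importlib.util

# Import names provided by the optional extras
OPTIONAL_MODULES = ("faiss", "hnswlib", "annoy", "pynndescent", "sympy")


def backend_status(modules=OPTIONAL_MODULES):
    """Map each module name to whether it is installed (not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def pick_backend(preferred=("faiss", "hnswlib", "annoy", "pynndescent")):
    """Return the first installed backend, else fall back to scikit-learn."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return "sklearn"


print(backend_status())
print(pick_backend())
```

`find_spec` only checks that a module can be located. A package that is installed but fails at import time (for example, because of a missing shared library) will still be reported as available.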

## Quick start

Below is a minimal example that constructs a weighted k‑NN graph from
synthetic data and prints the number of edges:

```python
import numpy as np
from dae.graph_utils import build_weighted_graph

# Generate reproducible synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)).astype(np.float32)

# Build a 10‑NN graph using the default UMAP kernel and 'max'
# symmetrization.  Returns rho offsets, per‑sample sigmas, symmetric
# edge indices (ei, ej), symmetric weights P_vals and the raw neighbor
# indices.
rho, sigmas, ei, ej, P_vals, neigh_idx = build_weighted_graph(
    X, k=10, symmetrize="max"
)

print(f"Constructed graph with {len(P_vals)} undirected edges")
```
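
Most downstream tools expect the graph as a sparse matrix. The sketch below assembles `ei`, `ej` and `P_vals` into a SciPy CSR matrix. The helper `to_sparse_affinity` and its `mirror` flag are hypothetical. The README does not say whether each undirected edge is stored once or in both directions, so `mirror=True` adds the reverse direction for the stored-once case.

```python
import numpy as np
from scipy.sparse import coo_matrix


def to_sparse_affinity(ei, ej, vals, n_samples, mirror=False):
    """Build an (n_samples x n_samples) CSR affinity matrix from edge lists."""
    ei = np.asarray(ei)
    ej = np.asarray(ej)
    vals = np.asarray(vals, dtype=float)
    if mirror:
        # Edges stored once: add the (j, i) copy of every (i, j) edge
        ei, ej = np.concatenate([ei, ej]), np.concatenate([ej, ei])
        vals = np.concatenate([vals, vals])
    # Converting COO to CSR sums any duplicate (i, j) entries
    return coo_matrix((vals, (ei, ej)), shape=(n_samples, n_samples)).tocsr()
```

After the quick-start snippet, `A = to_sparse_affinity(ei, ej, P_vals, X.shape[0])` followed by `abs(A - A.T).max()` is a quick way to check the storage convention. A nonzero result suggests the edges are stored once and `mirror=True` is needed.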

See the docstrings in `graph_utils/graph_builder.py` and
`ctmc_engine.py` for descriptions of all parameters and return values.
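
This README does not describe the CTMC optimizer itself. For orientation only, the sketch below shows the standard way a symmetric affinity matrix defines continuous-time Markov chain dynamics. Affinities act as jump rates in a generator `Q` whose rows sum to zero, and the transition probabilities after time `t` are `expm(t * Q)`. This is general CTMC background, not the algorithm in `ctmc_engine.py`. The helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def ctmc_generator(A):
    """Generator Q of a CTMC whose jump rate i -> j is A[i, j]."""
    Q = np.array(A, dtype=float)
    np.fill_diagonal(Q, 0.0)              # no self-jumps
    np.fill_diagonal(Q, -Q.sum(axis=1))   # each row sums to zero
    return Q


def transition_matrix(A, t):
    """P(t) = expm(t Q): row i is the distribution at time t started from i."""
    return expm(t * ctmc_generator(A))


A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 2.0, 0.0]])
print(transition_matrix(A, 0.5).round(3))
```

Small `t` keeps probability mass near each starting sample, and large `t` spreads it over the graph. With a symmetric `A`, `P(t)` is also symmetric, so the uniform distribution is stationary.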

## Citing

If you use this code in your research, please cite the DAE paper (insert
appropriate citation here).

## License

This project is licensed under the MIT License.  See the `LICENSE` file for
details.
